Method and system for double-end talk detection, and method and system for echo elimination

ABSTRACT

A method and system for eliminating echo in a speaker-microphone communication system are provided. The method includes the steps of: performing a noise energy estimating process on a local microphone signal in order to obtain an estimated noise signal, wherein the local microphone signal includes a local voice signal, possible background noise signal, and possible remote voice signal output from a speaker and received by a microphone; performing an echo estimating process on a remote voice signal to obtain an estimated echo signal, wherein the remote voice signal is output from the speaker; determining an error signal from the local microphone signal and the estimated echo signal; calculating a variance ($\sigma_e^2$) of the error signal and a variance ($\hat{\sigma}_n^2$) of the estimated noise signal; calculating a determinant (ξ), wherein the determinant (ξ) corresponds to the variance ($\sigma_e^2$) of the error signal and the variance ($\hat{\sigma}_n^2$) of the estimated noise signal; and comparing the determinant (ξ) and a preset threshold, wherein when the determinant (ξ) is lower than the preset threshold, it is determined that double-end talk has not occurred; otherwise, it is determined that double-end talk has occurred.

CROSS REFERENCE TO RELATED APPLICATIONS

This Application claims priority of China Patent Application No. 200910224949.9, filed on Nov. 26, 2009, the entirety of which is incorporated by reference herein.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to methods and systems for echo elimination, and in particular, relates to methods and systems for double-end talk detection in a communication system.

2. Description of the Related Art

Recently, a hands free communication system, which comprises microphone and speaker and is implemented in a mobile phone and a phone conference device, has become popular. During a hands free communication process, there are at least two terminals performing communication using a hands free communication system. Generally, voice from one of the at least two terminals is referred to as local voice, and voice from the other terminal is referred to as remote voice. The remote voice signals are output by a speaker of the local terminal, wherein part of the remote voice signals are input into a microphone of the local terminal, and are inadvertently transmitted to other remote terminals. Accordingly, echo of the remote voice is generated, and the user may thus hear his/her own voice and echo of other people's voices. In this situation, the echo should be eliminated or cancelled.

In addition, thanks to improvements in modern technology, both output gain of a speaker and input gain of a microphone have substantially increased. For some communication devices, a microphone can receive sound from a speaker without switching to a hands free mode. Accordingly, echo elimination is also required.

For a general echo eliminator, an adaptive filter is utilized for estimating an echo route and composing estimated echo signals to eliminate the estimated echo signals. When voice is transmitted by either of the local terminal or the remote terminal, i.e., single-end talk, it is easy for the adaptive filter to estimate the echo route (for example, the room impulse response value h remains practically constant). When voice is transmitted by both of the local terminal and the remote terminal, i.e., double-end talk, the signals input from the local microphone (also referred to as local microphone signal) may comprise not only an echo of the remote voice, but also local voice and background noise. When double-end talk occurs, the filter coefficient of the adaptive filter strays from the real echo route impulse response. Therefore, adaptation of the coefficient of the adaptive filter should be disabled; otherwise, an incorrect estimation of the echo route would be obtained, and the echo elimination effect would be downgraded accordingly. A double talk detector is used for detecting double-end talk, and adaptation of the coefficient of the adaptive filter is disabled, accordingly. Therefore, double talk detection is important for echo elimination.

There are several methods for double-end talk detection. For example, in “Integrated Echo and Noise Canceller for Hands-Free Applications” (IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-II: ANALOG AND DIGITAL SIGNAL PROCESSING, VOL. 49, NO. 3, March 2002), Seon Joon Park teaches a method for detecting double-end talk by calculating correlation coefficients between the signals input from the microphone and the estimated echo signals, calculating correlation coefficients between the signals input from the microphone and the error signals (i.e., the difference between the signals input from the microphone and the estimated echo signals), and comparing the calculated results with thresholds.

More specifically, the conventional double-end talk detection in a communication system is shown in FIG. 3. A double-end talk detection device calculates a correlation coefficient $\rho_{m\hat{y}}(k)$ between the signals m(k) input from a microphone and the estimated echo signals ŷ(k), and calculates a correlation coefficient $\rho_{me}(k)$ between the signals m(k) input from the microphone and the error signals e(k) (i.e., m(k)−ŷ(k)), according to equations (1) and (2).

$\begin{matrix} {{{\rho_{m\hat{y}}(k)} = \frac{P_{m\hat{y}}(k)}{\sqrt{{P_{m}(k)} \cdot {P_{\hat{y}}(k)}}}},} & {\;{{equation}\mspace{14mu}(1)}} \\ {{{\rho_{m\; e}(k)} = \frac{P_{m\; e}(k)}{\sqrt{{P_{m}(k)} \cdot {P_{e}(k)}}}},} & {{equation}\mspace{14mu}(2)} \end{matrix}$

$P_{m\hat{y}}(k)$ represents the correlation power between m(k) and ŷ(k), $P_{me}(k)$ represents the correlation power between m(k) and e(k), $P_m(k)$ represents the power of m(k), $P_{\hat{y}}(k)$ represents the power of ŷ(k), and $P_e(k)$ represents the power of e(k).

In a simulation, the value of $|\rho_{m\hat{y}}(k)|$ approaches 0 during double-end talk and approaches 1 during single-end talk. In addition, the value of $|\rho_{me}(k)|$ approaches 1 during double-end talk. Therefore, two threshold values are used: T1 (for example, 0.19, which is near 0) and T2 (for example, 0.9, which is near 1). $|\rho_{m\hat{y}}(k)|$ is compared with the threshold value T1, and $|\rho_{me}(k)|$ is compared with the threshold value T2. When $|\rho_{m\hat{y}}(k)| < T1$ and $|\rho_{me}(k)| > T2$, the double-end talk detection device determines that double-end talk has occurred; otherwise, it determines that single-end talk has occurred or at least that double-end talk has not occurred. In practice, however, because the transition time for switching from double-end talk to single-end talk differs from that for switching from single-end talk to double-end talk, it is difficult to select thresholds T1 and T2 that correctly detect the occurrence of double-end talk. In addition, because the double-end talk detecting device ignores the influence of noise and non-linear echo routes, its performance decreases when noise and non-linear echo routes exist in a speaker-microphone environment.
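The conventional decision rule of equations (1) and (2) can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, and estimating each power as a plain sum over one block of samples is an assumption made here, since the text does not say how the powers are estimated.

```python
import math

def correlation_coefficient(a, b):
    """Normalized cross-correlation P_ab / sqrt(P_a * P_b) over one block."""
    p_ab = sum(x * y for x, y in zip(a, b))
    p_a = sum(x * x for x in a)
    p_b = sum(y * y for y in b)
    denom = math.sqrt(p_a * p_b)
    # Assumed convention: return 0 when either signal has no power.
    return p_ab / denom if denom > 0.0 else 0.0

def conventional_double_talk(m, y_hat, t1=0.19, t2=0.9):
    """Return True when |rho_my| < T1 and |rho_me| > T2."""
    e = [mi - yi for mi, yi in zip(m, y_hat)]  # e(k) = m(k) - y_hat(k)
    rho_my = correlation_coefficient(m, y_hat)
    rho_me = correlation_coefficient(m, e)
    return abs(rho_my) < t1 and abs(rho_me) > t2
```

Both conditions must hold at once, so the outcome depends on how T1 and T2 are chosen together, which is the tuning difficulty described above.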

Consequently, there is a need for new and effective methods and systems for detecting double-end talk correctly and eliminating echo.

BRIEF SUMMARY OF THE INVENTION

A detailed description is given in the following embodiments with reference to the accompanying drawings.

According to an aspect of the present invention, a method for eliminating echo in a speaker-microphone communication system is provided, comprising the steps of: performing a noise energy estimating process on a local microphone signal in order to obtain an estimated noise signal, wherein the local microphone signal comprises a local voice signal, possible background noise signal, and possible remote voice signal output from a speaker and received by a microphone; performing an echo estimating process on a remote voice signal to obtain an estimated echo signal, wherein the remote voice signal is output from the speaker; determining an error signal from the local microphone signal and the estimated echo signal; calculating a variance (σ_(e) ²) of the error signal and a variance ({circumflex over (σ)}_(n) ²) of the estimated noise signal; calculating a determinant (ξ), wherein the determinant (ξ) corresponds to the variance (σ_(e) ²) of the error signal and variance ({circumflex over (σ)}_(n) ²) of the estimated noise signal; and comparing the determinant (ξ) and a preset threshold, wherein when the determinant (ξ) is lower than the preset threshold, it is determined that double-end talk has not occurred; otherwise, it is determined that double-end talk has occurred.

It is preferred that the determinant is $\xi = c \cdot |\sigma_e^2 - \hat{\sigma}_n^2|$, where c is a constant.

It is preferred that the determinant is

$0 \leq \xi = 1 - \frac{\hat{\sigma}_n^2}{\sigma_e^2}.$

It is preferred that the method further comprises: calculating a variance ($\hat{\sigma}_y^2$) of the estimated echo signal, wherein the determinant is

$\xi = \left| \frac{\sigma_e^2 - \hat{\sigma}_n^2}{\hat{\sigma}_y^2} \right|.$

It is preferred that the variance ($\sigma_e^2$) of the error signal, the variance ($\hat{\sigma}_n^2$) of the estimated noise signal, and/or the variance ($\hat{\sigma}_y^2$) of the estimated echo signal are calculated for a whole frequency domain or for a low frequency domain.

It is preferred that the determinant (ξ) is calculated for each frame, that the error signal, the estimated noise signal, and the estimated echo signal of a current frame are each transformed, using an N-point Fast Fourier Transform, into corresponding frequency domain signals, and that the variance ($\sigma_e^2$) of the error signal, the variance ($\hat{\sigma}_n^2$) of the estimated noise signal, or the variance ($\hat{\sigma}_y^2$) of the estimated echo signal is calculated according to the following equations:

$\sigma_e^2(i) = \lambda\,\sigma_e^2(i-1) + (1-\lambda)\sum_{j=0}^{P} E_j(i)$

$\hat{\sigma}_n^2(i) = \lambda\,\hat{\sigma}_n^2(i-1) + (1-\lambda)\sum_{j=0}^{P} \mathrm{Noise}_j(i)$

$\hat{\sigma}_y^2(i) = \lambda\,\hat{\sigma}_y^2(i-1) + (1-\lambda)\sum_{j=0}^{P} \hat{Y}_j(i),$

wherein $\sigma_e^2(i)$ is the variance of the error signal in the i-th frame, $\hat{\sigma}_n^2(i)$ is the variance of the estimated noise signal in the i-th frame, $\hat{\sigma}_y^2(i)$ is the variance of the estimated echo signal in the i-th frame, $E_j(i)$ represents energy at frequency j of the frequency domain signal of the error signal in the i-th frame, $\mathrm{Noise}_j(i)$ represents energy at frequency j of the frequency domain signal of the noise signal in the i-th frame, and $\hat{Y}_j(i)$ represents energy at frequency j of the frequency domain signal of the estimated echo signal in the i-th frame, wherein 0.9<λ<1, λ is a real number, 0<P≤N/2−1, P is a positive integer, and the value of P is determined by the non-linearity of the communication system.
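The recursive update in the equations above can be sketched in Python as follows. This is an illustration under stated assumptions, not the claimed implementation: the function name `update_variance`, the default λ = 0.95, and the representation of one frame as a list of N per-bin energies are all choices made here.

```python
def update_variance(previous, bin_energies, p, lam=0.95):
    """One frame of sigma^2(i) = lam * sigma^2(i-1) + (1 - lam) * sum_{j=0..P} energy_j(i)."""
    n = len(bin_energies)  # N, the FFT size
    if not 0.9 < lam < 1.0:
        raise ValueError("lam is expected to satisfy 0.9 < lam < 1")
    if not 0 < p <= n // 2 - 1:
        raise ValueError("P is expected to satisfy 0 < P <= N/2 - 1")
    band_energy = sum(bin_energies[: p + 1])  # bins 0..P inclusive
    return lam * previous + (1.0 - lam) * band_energy

# Hypothetical frame with N = 8 bins; only bins 0..3 contribute when P = 3.
energies = [1.0, 1.0, 1.0, 1.0, 9.0, 9.0, 9.0, 9.0]
sigma2 = 0.0
for _ in range(500):
    sigma2 = update_variance(sigma2, energies, p=3)
```

With a constant input, the smoothed value settles toward the band energy (here 4.0). A smaller P restricts the sum to lower frequencies, which corresponds to the option of calculating the variances for a low frequency domain only.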

It is preferred that when it is determined that double-end talk has occurred, the adaptation of adaptive filter is disabled.

According to another aspect of the present invention, a system for eliminating echo in a speaker-microphone communication system is provided, comprising: a noise energy estimating device, performing a noise energy estimating process on a local microphone signal in order to obtain an estimated noise signal, wherein the local microphone signal comprises a local voice signal, possible background noise signal, and possible remote voice signal output from a speaker and received by a microphone; an echo estimating device, performing an echo estimating process on a remote voice signal to obtain an estimated echo signal, wherein the remote voice signal is output from the speaker; an error signal determining device, determining an error signal from the local microphone signal and the estimated echo signal; a determinant calculator, calculating a determinant (ξ), wherein the determinant (ξ) corresponds to the variance (σ_(e) ²) of the error signal and variance ({circumflex over (σ)}_(n) ²) of the estimated noise signal; and a double-end talk detecting device, comparing the determinant (ξ) and a preset threshold, wherein when the determinant (ξ) is lower than the preset threshold, it is determined that double-end talk has not occurred; otherwise, it is determined that double-end talk has occurred.

It is preferred that the determinant calculator calculates the determinant as $\xi = c \cdot |\sigma_e^2 - \hat{\sigma}_n^2|$, where c is a constant.

It is preferred that the determinant calculator calculates the determinant as

$0 \leq \xi = 1 - \frac{\hat{\sigma}_n^2}{\sigma_e^2}.$

It is preferred that the determinant calculator further performs the step of: calculating a variance ($\hat{\sigma}_y^2$) of the estimated echo signal, wherein the determinant is

$\xi = \left| \frac{\sigma_e^2 - \hat{\sigma}_n^2}{\hat{\sigma}_y^2} \right|.$

It is preferred that the determinant calculator calculates the variance ($\sigma_e^2$) of the error signal, the variance ($\hat{\sigma}_n^2$) of the estimated noise signal, and/or the variance ($\hat{\sigma}_y^2$) of the estimated echo signal for a whole frequency domain or for a low frequency domain.

It is preferred that the determinant calculator calculates the determinant (ξ) for each frame, transforms, using an N-point Fast Fourier Transform, the error signal, the estimated noise signal, and the estimated echo signal of a current frame into corresponding frequency domain signals, respectively, and calculates the variance ($\sigma_e^2$) of the error signal, the variance ($\hat{\sigma}_n^2$) of the estimated noise signal, or the variance ($\hat{\sigma}_y^2$) of the estimated echo signal according to the following equations:

$\sigma_e^2(i) = \lambda\,\sigma_e^2(i-1) + (1-\lambda)\sum_{j=0}^{P} E_j(i)$

$\hat{\sigma}_n^2(i) = \lambda\,\hat{\sigma}_n^2(i-1) + (1-\lambda)\sum_{j=0}^{P} \mathrm{Noise}_j(i)$

$\hat{\sigma}_y^2(i) = \lambda\,\hat{\sigma}_y^2(i-1) + (1-\lambda)\sum_{j=0}^{P} \hat{Y}_j(i),$

wherein $\sigma_e^2(i)$ is the variance of the error signal in the i-th frame, $\hat{\sigma}_n^2(i)$ is the variance of the estimated noise signal in the i-th frame, $\hat{\sigma}_y^2(i)$ is the variance of the estimated echo signal in the i-th frame, $E_j(i)$ represents energy at frequency j of the frequency domain signal of the error signal in the i-th frame, $\mathrm{Noise}_j(i)$ represents energy at frequency j of the frequency domain signal of the noise signal in the i-th frame, and $\hat{Y}_j(i)$ represents energy at frequency j of the frequency domain signal of the estimated echo signal in the i-th frame, wherein 0.9<λ<1, λ is a real number, 0<P≤N/2−1, P is a positive integer, and the value of P is determined by the non-linearity of the communication system.

It is preferred that when the double-end talk detecting device determines that a double-end talk has occurred, the echo estimating adaptation process of the echo estimating coefficient is disabled.

According to another aspect of the present invention, a circuit for eliminating echo in a speaker-microphone communication system is provided. The system comprises a noise energy estimating device, an echo estimating device, and an error signal determining device. The circuit comprises a determinant calculator and a double-end talk detecting device. The determinant calculator is configured for calculating a determinant (ξ), wherein the determinant (ξ) corresponds to the variance (σ_(e) ²) of an error signal from the error signal determining device and variance ({circumflex over (σ)}_(n) ²) of an estimated noise signal from the noise energy estimating device. The double-end talk detecting device is configured for comparing the determinant (ξ) and a preset threshold, wherein when the determinant (ξ) is lower than the preset threshold, it is determined that double-end talk has not occurred, otherwise, it is determined that double-end talk has occurred.

The present invention determines whether a double-end talk has occurred by determining whether a determinant (ξ) corresponding to a variance (σ_(e) ²) of an error signal and a variance ({circumflex over (σ)}_(n) ²) of an estimated noise signal during a communication process is near a preset threshold. The present invention further considers influence of non-linear echo when double-end talk has occurred in order to clear up high frequency influence of non-linear echo; thereby correctly determining whether a double-end talk has occurred. Furthermore, for double-end talk occurrences, an echo estimating adaptation process/device (such as an adaptive filtering method/device for an adaptive coefficient) is disabled; thereby enabling efficient echo elimination.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention can be more fully understood by reading the subsequent detailed description and examples with references made to the accompanying drawings, wherein:

FIG. 1 is a schematic view of a communication system of an echo eliminating system of the present invention;

FIG. 2 is a flowchart of an echo eliminating method of the present invention; and

FIG. 3 is a schematic view of a communication system of a conventional echo eliminating system.

DETAILED DESCRIPTION OF THE INVENTION

The following description is of the best-contemplated mode of carrying out the invention. This description is made for the purpose of illustrating the general principles of the invention and should not be taken in a limiting sense. The scope of the invention is best determined by reference to the appended claims.

Refer to FIG. 1, which is a schematic view of a communication system of an echo eliminating system of the present invention.

The communication system comprises a speaker and a microphone, and an echo eliminating system 1. The echo eliminating system 1 comprises: a noise energy estimating device 11, performing a noise energy estimating process on a local microphone signal m(k) in order to obtain an estimated noise signal, wherein the local microphone signal may comprise: a local voice signal v(k), possible background noise signal n(k) and possible remote voice signal y(k) output from a speaker and received by a microphone; an echo estimating device 12, performing an echo estimating process on a remote voice signal to obtain an estimated echo signal, wherein the remote voice signal is output from the speaker; an error signal determining device 13, determining an error signal e(k) from the local microphone signal and the estimated echo signal; a determinant calculator 14, calculating a determinant (ξ) wherein the determinant (ξ) corresponds to the variance (σ_(e) ²) of the error signal and variance ({circumflex over (σ)}_(n) ²) of the estimated noise signal; and a double-end talk detecting device 15, comparing the determinant (ξ) and a preset threshold, wherein when the determinant (ξ) is lower than the preset threshold, it is determined that double-end talk has not occurred; otherwise, it is determined that double-end talk has occurred.

The described devices 11-15 can realize double-end talk detection. In addition to the devices 11-15, the echo eliminating system 1 further comprises: a remote voice signal detecting device (not shown), detecting a remote voice signal x(k) output from the speaker; and a local microphone signal receiving device (not shown), receiving a local microphone signal m(k) from the microphone. When no remote voice signal is detected by the remote voice signal detecting device and no microphone signal is received by the local microphone signal receiving device, the double-end talk detecting device 15 can determine that double-end talk has not occurred. Furthermore, when the double-end talk detecting device 15 determines that double-end talk has occurred, the echo estimating coefficient adaptation process of the echo estimating device is disabled, thereby enabling efficient echo elimination.
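The gating of the coefficient adaptation by the double-end talk decision can be sketched as follows. The text only requires an adaptive filter; the normalized LMS (NLMS) update used here is a swapped-in, commonly used choice and is an assumption, as are the function name `nlms_step`, the step size `mu`, and the regularization constant `eps`. Here `h_hat` holds the echo estimating coefficients and `x_history` holds the most recent remote voice samples, newest first.

```python
def nlms_step(h_hat, x_history, m_k, double_talk, mu=0.5, eps=1e-8):
    """One filter step; coefficients are updated only when double_talk is False.

    Returns (new_h_hat, e_k), where e_k = m_k - y_hat.
    """
    y_hat = sum(h * x for h, x in zip(h_hat, x_history))  # estimated echo
    e_k = m_k - y_hat                                      # error signal
    if double_talk:
        # Adaptation disabled: keep the coefficients frozen.
        return list(h_hat), e_k
    norm = sum(x * x for x in x_history) + eps
    new_h = [h + mu * e_k * x / norm for h, x in zip(h_hat, x_history)]
    return new_h, e_k
```

Freezing the coefficients keeps local voice, which appears in the error signal during double-end talk, from pulling the filter away from the echo route.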

More specifically, the remote voice signal x(k) passes through the echo route h (for example, a room impulse response that remains almost constant), and the resulting echo signal y(k) is received by the microphone, wherein k is a time index. Consequently, the real echo signal y(k) is given by the following equation (3):

$y(k) = h^T x \qquad \text{equation (3)},$

wherein $h = [h_0\ h_1\ \ldots\ h_{L-1}]^T$, $x = [x(k)\ x(k-1)\ \ldots\ x(k-L+1)]^T$, L represents the length of the echo route, and $h_0, h_1, \ldots, h_{L-1}$ are the impulse response coefficients.

However, it is difficult to detect the real echo signals. Consequently, it is common to use an adaptive filter as the echo estimating device 12 to estimate the echo route and generate the estimated echo signal ŷ(k). More specifically, the echo estimating coefficient of the echo estimating device 12, such as the adaptive filtering coefficient, is ĥ. Accordingly, an estimated echo signal ŷ(k) is obtained by performing the echo estimating process on the remote voice signal, i.e., according to the following equation (4):

$\hat{y}(k) = \hat{h}^T x \qquad \text{equation (4)}$

Because the microphone signal m(k) input from the microphone may comprise a local voice signal v(k), a background noise signal n(k), and an echo signal y(k), the local microphone signal m(k) received by the local microphone signal receiving device can be determined according to the following equation (5):

$m(k) = y(k) + v(k) + n(k) \qquad \text{equation (5)}$

Accordingly, an error signal e(k) can be obtained from a difference between the local microphone signal m(k) and the estimated echo signal ŷ(k). The error signal determining device 13 can determine the error signal e(k) according to the following equation (6):

$e(k) = m(k) - \hat{y}(k) = m(k) - \hat{h}^T x \qquad \text{equation (6)}$
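Equations (4) and (6) can be sketched sample by sample as follows. This is a minimal illustration with a fixed coefficient vector; the function names and the zero padding before the first sample are assumptions made here.

```python
def estimate_echo(h_hat, x_history):
    """Equation (4): y_hat(k) = h_hat^T x, with x = [x(k), x(k-1), ..., x(k-L+1)]."""
    return sum(h * x for h, x in zip(h_hat, x_history))

def error_signal(m_k, h_hat, x_history):
    """Equation (6): e(k) = m(k) - y_hat(k)."""
    return m_k - estimate_echo(h_hat, x_history)

def run_canceller(m, x, h_hat):
    """Apply equations (4) and (6) to every sample with a fixed h_hat."""
    length = len(h_hat)  # L, the length of the echo route
    errors = []
    for k in range(len(m)):
        # Newest sample first; samples before k = 0 are taken as zero (assumed).
        x_history = [x[k - j] if k - j >= 0 else 0.0 for j in range(length)]
        errors.append(error_signal(m[k], h_hat, x_history))
    return errors
```

When `h_hat` matches the echo route and the microphone signal contains only echo, the error is zero; any local voice or noise in m(k) passes through to e(k) unchanged.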

Because it is difficult to detect the background noise signal n(k), it is common to use the noise energy estimating device 11 to perform a noise energy estimation to determine the estimated noise $\hat{n}(k)$. It should be noted that several conventional methods and devices for noise estimation are available; details thereof are not given here.

Accordingly, if there is only a remote voice signal and not a local voice signal, i.e., for occurrence of single-end talk, the local voice signal v(k) is near (or equal) to 0. Accordingly, the error signal e(k) and estimated noise signal {circumflex over (n)}(k) are essentially the same or substantially equivalent to each other.

It can be determined whether the two values are equal or substantially equivalent by calculating a variance σ_(e) ² of the error signal e(k) and a variance {circumflex over (σ)}_(n) ² of the estimated noise signal {circumflex over (n)}(k). Certainly, other methods can be used to determine whether the error signal e(k) and estimated noise signal {circumflex over (n)}(k) are essentially the same. For example, the energy of the error signal and the energy of the noise signal may be calculated.

According to an embodiment of the present invention, the variance $\sigma_e^2$ of the error signal e(k) can be calculated, for example and without limitation, according to the following equation (7):

$\begin{matrix} \begin{matrix} {\sigma_{e}^{2} = {E\left\lbrack {ee}^{T} \right\rbrack}} \\ {= {E\left\lbrack {\left( {{m(k)} - {{\hat{h}}^{T}x}} \right)\left( {{m(k)} - {{\hat{h}}^{T}x}} \right)^{T}} \right\rbrack}} \\ {= {E\left\lbrack {\left( {{h^{T}x} + {v(k)} + {n(k)} - {{\hat{h}}^{T}x}} \right)\left( {{h^{T}x} + {v(k)} + {n(k)} - {{\hat{h}}^{T}x}} \right)^{T}} \right\rbrack}} \\ {= {E\left\lbrack {\left( {{\left( {h^{T} - {\hat{h}}^{T}} \right)x} + {v(k)} + {n(k)}} \right)\left( {{\left( {h^{T} - {\hat{h}}^{T}} \right)x} + {v(k)} + {n(k)}} \right)^{T}} \right\rbrack}} \end{matrix} & {{equation}\mspace{14mu}(7)} \end{matrix}$

The E[*] represents a mathematical expected value.

Since the remote voice signal x(k), local voice signal v(k) and noise signal n(k) are generally independent from each other, the equation (7) can be revised as equation (8):

$\begin{matrix} \begin{matrix} {\sigma_{e}^{2} = {{E\left\lbrack {\left( {h^{T} - {\hat{h}}^{T}} \right){{xx}^{T}\left( {h - \hat{h}} \right)}} \right\rbrack} + {E\left\lbrack v^{2} \right\rbrack} + {E\left\lbrack n^{2} \right\rbrack}}} \\ {= {{\left( {h^{T} - {\hat{h}}^{T}} \right){R_{xx}\left( {h - \hat{h}} \right)}} + \sigma_{v}^{2} + \sigma_{n}^{2}}} \end{matrix} & {{equation}\mspace{14mu}(8)} \end{matrix}$

In equation (8), $R_{xx} = E[xx^T]$, and $\sigma_v^2$ represents a variance of the local voice signal v(k).
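The step from equation (7) to equation (8) can be written out as follows. This is a sketch under two assumptions that the text does not state explicitly: the signals x, v(k), and n(k) are zero-mean, and h and ĥ are treated as fixed while the expectation is taken. The shorthand $d = h - \hat{h}$ is introduced here only for illustration.

$\sigma_e^2 = E\left[(d^T x + v + n)^2\right] = d^T E[xx^T]\, d + E[v^2] + E[n^2] + 2\, d^T E[x v] + 2\, d^T E[x n] + 2\, E[v n]$

$E[x v] = E[x]\,E[v] = 0, \qquad E[x n] = E[x]\,E[n] = 0, \qquad E[v n] = E[v]\,E[n] = 0$

Under these assumptions the three cross terms vanish, and only $d^T E[xx^T]\, d + E[v^2] + E[n^2]$ remains, which is the form given in equation (8).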

It should be noted that several conventional methods and devices for calculating the variance $\sigma_n^2$ of the noise signal are available; details thereof are not given here.

When the local voice signal v(k) is near (or equal to) 0, i.e., when no local voice signal exists and single-end talk has occurred, the variance $\sigma_v^2$ of the local voice signal v(k) is near (or equal to) 0. In this situation, the variance $\sigma_e^2$ of the error signal should be near or equal to the variance $\hat{\sigma}_n^2$ of the estimated noise signal. Accordingly, when the double-end talk detecting device 15 ascertains that the variance $\sigma_e^2$ of the error signal equals the variance $\hat{\sigma}_n^2$ of the estimated noise signal, it is determined that double-end talk has not occurred and that single-end talk has occurred; otherwise, it is determined that double-end talk has occurred.

In one embodiment, the double-end talk detecting device 15 is configured for comparing the variance $\sigma_e^2$ of the error signal and the variance $\hat{\sigma}_n^2$ of the estimated noise signal, and for determining that double-end talk has not occurred when the absolute value of their difference is lower than a preset threshold; otherwise, it is determined that double-end talk has occurred.

According to an embodiment of the present invention, a determinant ξ for detecting double-end talk can be defined by the equation (9a):

$\xi = |\sigma_e^2 - \hat{\sigma}_n^2| \qquad \text{equation (9a)}$

In another embodiment, the absolute difference value ($|\sigma_e^2 - \hat{\sigma}_n^2|$) is multiplied by a constant c, which can be determined by simulation, and the product is then compared with the preset threshold. If the product of the absolute difference value ($|\sigma_e^2 - \hat{\sigma}_n^2|$) and the constant c is lower than the preset threshold, then the double-end talk detecting device 15 determines that double-end talk has not occurred; otherwise, it is determined that double-end talk has occurred. Accordingly, a determinant ξ can be defined by the equation (9b):

$\xi = c \cdot |\sigma_e^2 - \hat{\sigma}_n^2| \qquad \text{equation (9b)}$

In another embodiment, the double-end talk detecting device 15 may calculate the absolute value of the ratio of the difference between the variance $\sigma_e^2$ of the error signal and the variance $\hat{\sigma}_n^2$ of the estimated noise signal to the variance $\hat{\sigma}_y^2$ of the estimated echo signal, and compare this absolute ratio with a preset threshold. When the ratio is lower than the preset threshold, it is determined that double-end talk has not occurred. In this embodiment, the difference between the variance $\sigma_e^2$ of the error signal and the variance $\hat{\sigma}_n^2$ of the estimated noise signal is divided by the variance $\hat{\sigma}_y^2$ of the estimated echo signal, which yields the value of the difference relative to the variance of the estimated echo signal. According to this relative value, it can be determined more precisely whether the variance $\sigma_e^2$ of the error signal and the variance $\hat{\sigma}_n^2$ of the estimated noise signal are equal. In other words, when the difference between the variance $\sigma_e^2$ of the error signal and the variance $\hat{\sigma}_n^2$ of the estimated noise signal is greater, a more stable proportional value can be obtained by dividing the difference by the variance $\hat{\sigma}_y^2$ of the estimated echo signal (which might be greater, too). Consequently, it is easier to set a preset threshold that allows precise detection.

More specifically, a determinant ξ can be defined by the equation (9c):

$\xi = \left| \frac{\sigma_e^2 - \hat{\sigma}_n^2}{\hat{\sigma}_y^2} \right| \qquad \text{equation (9c)}$

The σ_(e) ² represents a variance of an error signal, the {circumflex over (σ)}_(n) ² represents a variance of a noise signal, and the {circumflex over (σ)}_(y) ² represents a variance of an estimated echo signal. Accordingly, if ξ is near 0, the double-end talk detecting device 15 determines that double-end talk has not occurred; otherwise, the double-end talk detecting device 15 determines that double-end talk has occurred. More specifically, if the ξ is lower than a preset threshold, the double-end talk detecting device 15 determines that double-end talk has not occurred; otherwise, the double-end talk detecting device 15 determines that double-end talk has occurred.

In another embodiment, the double-end talk detecting device 15 may be configured for comparing a ratio between the variance $\hat{\sigma}_n^2$ of the estimated noise signal and the variance $\sigma_e^2$ of the error signal with a preset threshold. Since the error signals comprise noise signals, the ratio between the variance $\hat{\sigma}_n^2$ of the estimated noise signal and the variance $\sigma_e^2$ of the error signal is a value between 0 and 1. A determinant ξ can be defined by the equation (9d):

$0 \leq \xi = 1 - \frac{\hat{\sigma}_n^2}{\sigma_e^2} \qquad \text{equation (9d)}$

When there is only a remote voice signal but no local voice signal, i.e., double-end talk has not occurred, the value of $\hat{\sigma}_n^2 / \sigma_e^2$ is near 1. Thus, the determinant ξ is near 0. When there are local voice signals, the value of $\hat{\sigma}_n^2 / \sigma_e^2$ might be lower than 1. Thus, the determinant ξ is not near 0. Accordingly, the preset threshold can be determined by simulation, and when the ξ value is lower than the preset threshold, it is determined that double-end talk has not occurred.

According to equations (9a) to (9d), the determinant ξ relates to the variance $\sigma_e^2$ of the error signal and the variance $\hat{\sigma}_n^2$ of the estimated noise signal.
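The four determinant variants and the threshold comparison can be sketched side by side as follows. This is an illustrative sketch only: the function names, the small `eps` guard against division by zero, and the clipping of (9d) at 0 are assumptions made here, and any threshold value would have to be chosen by simulation as described above.

```python
def xi_abs_difference(var_e, var_n_hat, c=1.0):
    """Equations (9a) and (9b): xi = c * |var_e - var_n_hat| (c = 1 gives (9a))."""
    return c * abs(var_e - var_n_hat)

def xi_echo_normalized(var_e, var_n_hat, var_y_hat, eps=1e-12):
    """Equation (9c): the difference relative to the estimated-echo variance."""
    return abs((var_e - var_n_hat) / max(var_y_hat, eps))

def xi_noise_ratio(var_e, var_n_hat, eps=1e-12):
    """Equation (9d): xi = 1 - var_n_hat / var_e, clipped at 0 (assumed)."""
    return max(0.0, 1.0 - var_n_hat / max(var_e, eps))

def is_double_talk(xi, threshold):
    """Double-end talk is declared unless xi is lower than the preset threshold."""
    return not (xi < threshold)
```

For example, with a hypothetical threshold of 0.2, `is_double_talk(xi_noise_ratio(1.0, 0.98), 0.2)` evaluates to `False` because the error variance is almost entirely explained by the noise estimate, whereas `is_double_talk(xi_noise_ratio(5.0, 1.0), 0.2)` evaluates to `True`.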

In addition, it should be well-known by those of ordinary skill in the art, that a variance σ_(y) ² of real echo signals can be determined by equation (10).

$\begin{matrix} \begin{matrix} {\sigma_{y}^{2} = {E\left\lbrack {yy}^{T} \right\rbrack}} \\ {= {E\left\lbrack {h^{T}{xx}^{T}h} \right\rbrack}} \\ {= {h^{T}R_{xx}h}} \end{matrix} & {{equation}\mspace{14mu}(10)} \end{matrix}$

Furthermore, the variance of the estimated echo signal is given by equation (11):

$\hat{\sigma}_y^2 = \hat{h}^T R_{xx} \hat{h} \qquad \text{equation (11)}$

Accordingly, the ξ in equation (9c) can be revised as follows.

$\begin{matrix} {\xi = {\frac{\sigma_{e}^{2} - {\hat{\sigma}}_{n}^{2}}{{\hat{\sigma}}_{y}^{2}}}} \\ {= {\frac{{\left( {h^{T} - {\hat{h}}^{T}} \right){R_{xx}\left( {h - \hat{h}} \right)}} + \sigma_{v}^{2} + \sigma_{n}^{2} - {\hat{\sigma}}_{n}^{2}}{h^{T}R_{xx}h}}} \end{matrix}$

This further illustrates that, when there is no local voice signal, the terms of equation (12) behave as follows: σ_(v) ² is near or equal to 0, ĥ is near h (and ĥ^(T) is near h^(T)), and {circumflex over (σ)}_(n) ² is near σ_(n) ², so that the numerator of equation (12) is near 0. Accordingly, it can be determined that double-end talk has not occurred when the |σ_(e) ²−{circumflex over (σ)}_(n) ²| value in equations (9a)-(9c) is near 0; otherwise, it can be determined that double-end talk has occurred.
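The behavior of equation (12) can be checked numerically with a short Python sketch. The two-tap echo path, the autocorrelation matrix, and all variance values below are invented for illustration, and the helper names `quad_form` and `xi_eq12` are hypothetical; the sketch simply evaluates the second line of equation (12) as written.

```python
def quad_form(a, R, b):
    """Return a^T R b for list-based vectors a, b and a square matrix R."""
    return sum(a[r] * R[r][c] * b[c] for r in range(len(a)) for c in range(len(b)))


def xi_eq12(h, h_hat, R_xx, var_v, var_n, var_n_hat):
    """Second line of equation (12)."""
    diff = [true - est for true, est in zip(h, h_hat)]
    numerator = quad_form(diff, R_xx, diff) + var_v + var_n - var_n_hat
    return numerator / quad_form(h, R_xx, h)


R_xx = [[1.0, 0.5], [0.5, 1.0]]   # assumed remote-signal autocorrelation matrix
h = [0.8, 0.3]                    # assumed true echo path
h_hat = [0.79, 0.31]              # assumed converged estimate of the echo path

# No local voice (var_v = 0) and an accurate noise estimate: xi is near 0.
xi_quiet = xi_eq12(h, h_hat, R_xx, var_v=0.0, var_n=0.01, var_n_hat=0.01)
# Local voice present (var_v > 0): xi moves away from 0.
xi_talk = xi_eq12(h, h_hat, R_xx, var_v=0.5, var_n=0.01, var_n_hat=0.01)
```

With these assumed values, only the local voice variance changes between the two cases, which isolates the term that the detector responds to.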

The following paragraphs provide discussion concerning calculation of the variance σ_(e) ² of the error signal, the variance {circumflex over (σ)}_(n) ² of the estimated noise signal, and the variance {circumflex over (σ)}_(y) ² of the estimated echo signal.

The variance σ_(e) ² of the error signal, the variance {circumflex over (σ)}_(n) ² of the estimated noise signal, and the variance {circumflex over (σ)}_(y) ² of the estimated echo signal can be calculated over a time domain or a frequency domain. The present invention focuses on calculation over the frequency domain. The determinant ξ can be calculated for each frame.

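First, the following is defined.

$\begin{matrix} {E(i) = \left\lbrack E_{0}(i)\; E_{1}(i)\; \ldots\; E_{N-1}(i) \right\rbrack^{T} = {FFT}\left\lbrack e(iM) \right\rbrack} & {{equation}\mspace{14mu}(13)} \\ {N(i) = \left\lbrack {Noise}_{0}(i)\; {Noise}_{1}(i)\; \ldots\; {Noise}_{N-1}(i) \right\rbrack^{T} = {FFT}\left\lbrack n(iM) \right\rbrack} & {{equation}\mspace{14mu}(14)} \\ {\hat{Y}(i) = \left\lbrack {\hat{Y}}_{0}(i)\; {\hat{Y}}_{1}(i)\; \ldots\; {\hat{Y}}_{N-1}(i) \right\rbrack^{T} = {FFT}\left\lbrack \hat{y}(iM) \right\rbrack} & {{equation}\mspace{14mu}(15)} \end{matrix}$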

The ‘i’ represents a frame index, M represents a frame size, N represents a window size for fast Fourier transform, and FFT[*] represents implementing windowed fast Fourier transform by overlapping.

The E(i), N(i), and Ŷ(i) are N-element column vectors, and E_(j)(i) represents energy at frequency j after FFT transformation. An N point FFT transformation generates N outputs; E(i) represents the output from the N point FFT transformation of the error signals in the i^(th) frame. Similarly, the N(i) and Ŷ(i) represent outputs from the N point FFT transformation of the estimated noise signals and estimated echo signals in the i^(th) frame.

In an embodiment of the present invention, M is 80, N is 128. Of course, values of the M and N are not limited to the described values.

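The time domain vectors used in equations (13)-(15) are formed by the following equations, wherein k is a time index; if k is substituted by iM, then the e(iM), n(iM), and ŷ(iM) are spaced by intervals of M samples (i.e., data for a frame). On the left-hand side of each equation, e(k), n(k), and ŷ(k) denote the windowed N-sample vectors ending at time index k.

$\begin{matrix} {e(k) = \left\lbrack e(k)\; e(k-1)\; \ldots\; e(k-N+1) \right\rbrack^{T} \;.\!*\; \left\lbrack w(0)\; w(1)\; \ldots\; w(N-1) \right\rbrack^{T}} & {{equation}\mspace{14mu}(16)} \\ {n(k) = \left\lbrack n(k)\; n(k-1)\; \ldots\; n(k-N+1) \right\rbrack^{T} \;.\!*\; \left\lbrack w(0)\; w(1)\; \ldots\; w(N-1) \right\rbrack^{T}} & {{equation}\mspace{14mu}(17)} \\ {\hat{y}(k) = \left\lbrack \hat{y}(k)\; \hat{y}(k-1)\; \ldots\; \hat{y}(k-N+1) \right\rbrack^{T} \;.\!*\; \left\lbrack w(0)\; w(1)\; \ldots\; w(N-1) \right\rbrack^{T}} & {{equation}\mspace{14mu}(18)} \end{matrix}$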

The symbol “.*” represents element-by-element multiplication of two vectors. The w(0) . . . w(N−1) are the coefficients of a window function. It should be well-known by those of ordinary skill in the art that, before transformation from a time domain to a frequency domain, a window function (such as a Hanning window) can be applied to time domain signals.
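The windowing of equations (16)-(18) and the per-frequency energies of equations (13)-(15) can be sketched in Python as follows. The sketch assumes a symmetric Hanning window, treats samples before the start of the signal as zero, takes the squared magnitude of each transform output as the "energy", and uses a naive discrete Fourier transform in place of an FFT so that it needs only the standard library. The 500 Hz test tone and the 8000 Hz sampling rate are illustrative choices, and all function names are hypothetical.

```python
import cmath
import math


def hann_window(n_fft):
    """Symmetric Hanning window coefficients w(0) ... w(N-1)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (n_fft - 1)) for n in range(n_fft)]


def windowed_frame(signal, k, window):
    """Equations (16)-(18): [s(k) s(k-1) ... s(k-N+1)] .* [w(0) ... w(N-1)]."""
    n_fft = len(window)
    # Assumption: samples before the start of the signal are treated as zero.
    return [(signal[k - n] if k - n >= 0 else 0.0) * window[n] for n in range(n_fft)]


def bin_energies(frame):
    """Squared magnitude of each of the N transform outputs (naive O(N^2) DFT)."""
    n_fft = len(frame)
    energies = []
    for j in range(n_fft):
        acc = sum(frame[n] * cmath.exp(-2j * math.pi * j * n / n_fft) for n in range(n_fft))
        energies.append(abs(acc) ** 2)
    return energies


M, N = 80, 128        # frame size and transform size from the embodiment
FS = 8000.0           # assumed sampling rate in Hz
tone = [math.sin(2.0 * math.pi * 500.0 * t / FS) for t in range(4 * M)]

i = 3                 # frame index; the frame ends at time index k = i * M
E_i = bin_energies(windowed_frame(tone, i * M, hann_window(N)))
peak_bin = max(range(N // 2), key=lambda j: E_i[j])
```

Since N is larger than M, consecutive frames share N−M samples, which corresponds to the overlapping mentioned for FFT[*]. A production implementation would normally replace `bin_energies` with an FFT routine.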

The variance σ_(e) ² of the error signal, the variance {circumflex over (σ)}_(n) ² of the estimated noise signal, and the variance {circumflex over (σ)}_(y) ² of the estimated echo signal can be calculated by an exponentially weighted recursive algorithm. More specifically, the variances are calculated according to the following equations.

$\begin{matrix} {{\sigma_{e}^{2}(i)} = {{{\lambda\sigma}_{e}^{2}\left( {i - 1} \right)} + {\left( {1 - \lambda} \right){\sum\limits_{j = 0}^{j = {\frac{N}{2} - 1}}{E_{j}(i)}}}}} & {{equation}\mspace{14mu}(19)} \\ {{{\hat{\sigma}}_{n}^{2}(i)} = {{\lambda{{\hat{\sigma}}_{n}^{2}\left( {i - 1} \right)}} + {\left( {1 - \lambda} \right){\sum\limits_{j = 0}^{j = {\frac{N}{2} - 1}}{{Noise}_{j}(i)}}}}} & {{equation}\mspace{14mu}(20)} \\ {{{\hat{\sigma}}_{y}^{2}(i)} = {{\lambda{{\hat{\sigma}}_{y}^{2}\left( {i - 1} \right)}} + {\left( {1 - \lambda} \right){\sum\limits_{j = 0}^{j = {\frac{N}{2} - 1}}{{\hat{Y}}_{j}(i)}}}}} & {{equation}\mspace{14mu}(21)} \end{matrix}$

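In these equations, 0.9<λ<1. The frequency index j of the FFT outputs spans 0≦j≦N−1, and the sums in equations (19)-(21) run over the first N/2 outputs, i.e., 0≦j≦N/2−1.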

The E_(j)(i) represents energy at frequency j generated from N point FFT transformation of data of the i^(th) frame of the error signals. Similarly, Noise_(j)(i) and Ŷ_(j)(i) represent energy at frequency j generated from N point FFT transformation of data of the i^(th) frame of the estimated noise signals and estimated echo signals, respectively.

The variance σ_(e) ² of the error signal, the variance {circumflex over (σ)}_(n) ² of the estimated noise signal, and the variance {circumflex over (σ)}_(y) ² of the estimated echo signal can be accordingly calculated, along with the determinant ξ.

According to equations (19)-(21), in a frequency range from j=0 to j=(N/2−1) (i.e., in the whole frequency domain), energy of error signals, energy of estimated noise signals, and energy of estimated echo signals of a current frame can be calculated by calculating sums of E_(j)(i), Noise_(j)(i), and Ŷ_(j)(i), respectively.
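The recursive update shared by equations (19)-(21) can be sketched in Python as follows. The function name `update_variance`, the default λ of 0.95, and the optional `last_bin` argument are illustrative assumptions; the same routine would be called once per frame for each of the error, estimated noise, and estimated echo signals.

```python
def update_variance(prev_var, bin_energy, lam=0.95, last_bin=None):
    """One frame of the recursive update in equations (19)-(21).

    prev_var   -- smoothed value from frame i-1
    bin_energy -- per-frequency energies of frame i (length N)
    lam        -- weighting factor, 0.9 < lam < 1
    last_bin   -- upper summation index; defaults to N/2 - 1
    """
    if not 0.9 < lam < 1.0:
        raise ValueError("lam must satisfy 0.9 < lam < 1")
    if last_bin is None:
        last_bin = len(bin_energy) // 2 - 1
    frame_energy = sum(bin_energy[: last_bin + 1])
    return lam * prev_var + (1.0 - lam) * frame_energy


# With a constant per-frame energy, the smoothed value settles at that energy.
flat = [1.0] * 128          # assumed energies: 1.0 in every one of N = 128 outputs
var = 0.0
for _ in range(500):
    var = update_variance(var, flat)
```

A value of λ closer to 1 gives a smoother but slower-reacting estimate, which is the usual trade-off when choosing the weighting factor.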

Generally, in communication systems implemented for real mobile phone applications or telephone conferences, a speaker is driven at a maximum voltage to increase the volume of the speaker. Consequently, the speaker output saturates, resulting in non-linearity. The non-linearity appears as harmonic distortion, i.e., generation of a large amount of harmonic components. For example, the remote voice signals may have a low frequency (such as 450 Hz) before they are digital/analog converted and output from the speaker. When the remote voice signals are output from the speaker and received by the microphone, the received local microphone signals comprise not only noise signals, but also higher frequency components, such as 900 Hz, 1350 Hz, 2250 Hz, and 3150 Hz. Conventionally, a linear adaptive filter (an exemplary embodiment of the echo estimating device 12) can estimate only linear echo. Accordingly, a linear adaptive filter can eliminate linear echo by subtracting the estimated echo signals generated by the linear adaptive filter from the local microphone signals. In other words, echo at frequency 450 Hz can be eliminated, while non-linear echo (such as echo at frequency 900 Hz, 1350 Hz, 2250 Hz, and 3150 Hz) cannot be eliminated. Generally, speaker signals at a frequency f influence residual echo signals at harmonics of f (for f=450 Hz, at 900 Hz, 1350 Hz, 2250 Hz, and 3150 Hz). In other words, residual non-linear echo signals are concentrated in the high frequency domain. Because multiple harmonic components of low-frequency components overlap in the high frequency domain, the impact of harmonic components is more serious at higher frequencies. Consequently, residual high-frequency non-linear echo causes problems for double-end talk detection.
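To make the harmonic example concrete, the following Python sketch lists the integer harmonics of a fundamental frequency that fall below half the sampling rate and maps each one to its nearest FFT output index. The function name `harmonic_bins` is hypothetical, and the defaults (an 8000 Hz sampling rate and N of 128) are taken from the examples given elsewhere in this description rather than being requirements.

```python
def harmonic_bins(f0_hz, fs_hz=8000.0, n_fft=128):
    """Map each harmonic of f0 below fs/2 to its nearest FFT output index.

    Returns a list of (harmonic order, frequency in Hz, index) tuples.
    """
    bin_width = fs_hz / n_fft            # 62.5 Hz for the default values
    result = []
    order = 2                            # order 1 is the fundamental itself
    while order * f0_hz < fs_hz / 2.0:
        freq = order * f0_hz
        result.append((order, freq, round(freq / bin_width)))
        order += 1
    return result


harmonics = harmonic_bins(450.0)
fundamental_bin = round(450.0 / (8000.0 / 128))
```

For a 450 Hz fundamental, the harmonics named in the text (900 Hz, 1350 Hz, 2250 Hz, and 3150 Hz) land at indices 14, 22, 36, and 50, all above the index of the fundamental, which is 7.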

Accordingly, the invention improves upon calculating energy of the error signal, energy of estimated noise signals, and energy of estimated echo signals of a current frame. That is, the calculation can apply only to the frequency components in the low-frequency domain and not the high-frequency domain impacted by harmonic components. As a result, the influence caused by residual high-frequency non-linear echo for double-end talk detection is reduced or eliminated. More specifically, the equations (19), (20), (21) can be revised as follows.

$\begin{matrix} {{\sigma_{e}^{2}(i)} = {{{\lambda\sigma}_{e}^{2}\left( {i - 1} \right)} + {\left( {1 - \lambda} \right){\sum\limits_{j = 0}^{P}{E_{j}(i)}}}}} & {{equation}\mspace{14mu}(22)} \\ {{{\hat{\sigma}}_{n}^{2}(i)} = {{\lambda{{\hat{\sigma}}_{n}^{2}\left( {i - 1} \right)}} + {\left( {1 - \lambda} \right){\sum\limits_{j = 0}^{P}{{Noise}_{j}(i)}}}}} & {{equation}\mspace{14mu}(23)} \\ {{{\hat{\sigma}}_{y}^{2}(i)} = {{\lambda{{\hat{\sigma}}_{y}^{2}\left( {i - 1} \right)}} + {\left( {1 - \lambda} \right){\sum\limits_{j = 0}^{P}{{\hat{Y}}_{j}(i)}}}}} & {{equation}\mspace{14mu}(24)} \end{matrix}$

In the described equations, 0<P<N/2−1. The value of P can be determined by the non-linearity of the communication system. In other words, according to the original energy calculation, the range 0≦j≦N/2−1 covers the whole frequency domain (for example, the frequency domain is 0˜4000 Hz at a sampling rate of 8000 Hz); while for the improved energy calculation, 0≦j≦P, and 0<P<N/2−1. Thus, the calculation is only executed on the frequency components in the low-frequency domain. As a result, the influence caused by residual high-frequency non-linear echo during double-end talk detection is reduced or eliminated, and accuracy of double-end talk detection can be improved, so that echo elimination can be realized more effectively.
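The choice of P and the band-limited sum of equations (22)-(24) can be sketched in Python as follows. Deriving P from a cutoff frequency is an assumption made for illustration, since the description only states that P depends on the non-linearity of the communication system; the 800 Hz cutoff and the energy values are likewise invented, and the function names are hypothetical.

```python
def cutoff_bin(cutoff_hz, fs_hz, n_fft):
    """Largest index P whose frequency does not exceed cutoff_hz."""
    p = int(cutoff_hz * n_fft / fs_hz)
    if not 0 < p < n_fft // 2 - 1:
        raise ValueError("P must satisfy 0 < P < N/2 - 1")
    return p


def low_band_energy(bin_energy, p):
    """Inner sum of equations (22)-(24): indices 0 through P inclusive."""
    return sum(bin_energy[: p + 1])


N, FS = 128, 8000.0
P = cutoff_bin(800.0, FS, N)       # assumed cutoff below the 900 Hz harmonic

# Assumed error-signal energies: a small linear residue at index 7 (about 450 Hz)
# and a large non-linear residue at index 14 (about 900 Hz).
energies = [0.0] * N
energies[7] = 1.0
energies[14] = 5.0

full_band = sum(energies[: N // 2])        # sum used in equations (19)-(21)
low_band = low_band_energy(energies, P)    # sum used in equations (22)-(24)
```

In this example the harmonic residue dominates the full-band sum but is excluded from the low-band sum, so it no longer inflates the error variance used by the detector.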

Referring to FIG. 2, a flowchart of an echo eliminating method of the present invention is illustrated.

First, in step S11, a noise energy estimating process is performed on a local microphone signal in order to obtain an estimated noise signal, wherein the local microphone signal comprises a possible local voice signal, a possible background noise signal, and a possible remote voice signal output from a speaker and received by a microphone.

In step S12, an echo estimating process is performed on a remote voice signal to obtain an estimated echo signal, wherein the remote voice signal is output from the speaker.

In step S13, an error signal is determined from the local microphone signal and the estimated echo signal.

In step S14, a variance (σ_(e) ²) of the error signal and a variance ({circumflex over (σ)}_(n) ²) of the estimated noise signal are calculated.

In step S15, a determinant (ξ) is calculated, wherein the determinant (ξ) corresponds to the variance (σ_(e) ²) of the error signal and variance ({circumflex over (σ)}_(n) ²) of the estimated noise signal.

In step S16, the determinant (ξ) and a preset threshold are compared. When the determinant (ξ) is lower than the preset threshold, it is determined in step S17 that double-end talk has not occurred; otherwise, it is determined in step S18 that double-end talk has occurred.

Thus, the method is finished.

In the described method, the remote voice signals output from the speaker and the local microphone signals received by the microphone can be detected first. In a case where no remote voice signal exists or no local microphone signal is received, it can be determined directly that double-end talk has not occurred, thus speeding up detection.
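Steps S15 to S18, together with the optional pre-check, can be sketched in Python as follows. The sketch uses the difference form of the determinant, ξ=c·|σ_(e) ²−{circumflex over (σ)}_(n) ²|. How signal activity is detected is not specified here, so the two activity flags are simply passed in; the function name, the flags, and the example threshold are assumptions.

```python
def detect_double_talk(var_e, var_n_hat, threshold,
                       far_end_active=True, mic_active=True, c=1.0):
    """Return True when double-end talk is declared.

    var_e          -- variance of the error signal (step S14)
    var_n_hat      -- variance of the estimated noise signal (step S14)
    threshold      -- preset threshold for the determinant
    far_end_active -- assumed flag: a remote voice signal is being output
    mic_active     -- assumed flag: a local microphone signal is being received
    c              -- constant scale factor of the difference form
    """
    # Pre-check: without both signals, double-end talk cannot have occurred.
    if not (far_end_active and mic_active):
        return False
    xi = c * abs(var_e - var_n_hat)      # step S15
    return not (xi < threshold)          # steps S16 to S18


no_talk = detect_double_talk(1.0, 0.95, threshold=0.2)
talk = detect_double_talk(3.0, 1.0, threshold=0.2)
skipped = detect_double_talk(3.0, 1.0, threshold=0.2, far_end_active=False)
```

The early return avoids computing the determinant at all when one of the two signals is absent, which is the speed-up the pre-check is intended to provide.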

In step S15 of the method, in the calculation of the determinant (ξ), the determinant can be calculated as ξ=c·|σ_(e) ²−{circumflex over (σ)}_(n) ²|, and ‘c’ is a constant.

In step S15 of the method, the determinant can be calculated as ${0 \leq \xi} = {1 - {{\frac{{\hat{\sigma}}_{n}^{2}}{\sigma_{e}^{2}}}.}}$

In step S15 of the method, the determinant can be calculated as $\xi = {\frac{\sigma_{e}^{2} - {\hat{\sigma}}_{n}^{2}}{{\hat{\sigma}}_{y}^{2}}}$ by calculating a variance ({circumflex over (σ)}_(y) ²) of the estimated echo signal.

The variance (σ_(e) ²) of the error signal, the variance ({circumflex over (σ)}_(n) ²) of the estimated noise signal, and/or the variance ({circumflex over (σ)}_(y) ²) of the estimated echo signal are calculated for a whole frequency domain or for a low frequency domain, and the determinant (ξ) is calculated for each frame.

The error signal of a current frame, the estimated noise signal of the current frame, and the estimated echo signal of the current frame are transformed, using an N point Fast Fourier Transform, to corresponding frequency domain signals, respectively, and the variance (σ_(e) ²) of the error signal, variance ({circumflex over (σ)}_(n) ²) of the estimated noise signal, or variance ({circumflex over (σ)}_(y) ²) of the estimated echo signal are calculated according to the following equations:

$\begin{matrix} {{\sigma_{e}^{2}(i)} = {{{\lambda\sigma}_{e}^{2}\left( {i - 1} \right)} + {\left( {1 - \lambda} \right){\sum\limits_{j = 0}^{P}{E_{j}(i)}}}}} \\ {{{\hat{\sigma}}_{n}^{2}(i)} = {{\lambda{{\hat{\sigma}}_{n}^{2}\left( {i - 1} \right)}} + {\left( {1 - \lambda} \right){\sum\limits_{j = 0}^{P}{{Noise}_{j}(i)}}}}} \\ {{{{\hat{\sigma}}_{y}^{2}(i)} = {{\lambda{{\hat{\sigma}}_{y}^{2}\left( {i - 1} \right)}} + {\left( {1 - \lambda} \right){\sum\limits_{j = 0}^{P}{{\hat{Y}}_{j}(i)}}}}},} \end{matrix}$

wherein σ_(e) ²(i) is the variance of the error signal in the i^(th) frame, {circumflex over (σ)}_(n) ²(i) is the variance of the estimated noise signal in the i^(th) frame, {circumflex over (σ)}_(y) ²(i) is the variance of the estimated echo signal in the i^(th) frame, E_(j)(i) represents energy at frequency j of the frequency domain signal of the error signal in the i^(th) frame, Noise_(j)(i) represents energy at frequency j of the frequency domain signal of the estimated noise signal in the i^(th) frame, and Ŷ_(j)(i) represents energy at frequency j of the frequency domain signal of the estimated echo signal in the i^(th) frame, wherein 0.9<λ<1, λ is a real number, 0<P≦N/2−1, P is a positive integer, and the value of P is determined by the non-linearity of the communication system.

In short, the calculation of variance of the error signal, the variance of the estimated noise signal, and/or the variance of the estimated echo signal can be accomplished utilizing the described method or other known methods. Herein, unnecessary details thereof are not given. The preset threshold used for comparison with the determinant can be set by simulation, experience, or manual settings according to the calculation method of the determinant.

When it is determined that double-end talk has occurred according to the system and method of the present invention, the adaptation process of the echo estimating coefficients, implemented by the adaptive filter (which is an exemplary embodiment of the echo estimating device 12) or in an echo estimating step, is disabled, so that the estimated echo path remains accurate during double-end talk occurrences. Accordingly, echo is correctly estimated and eliminated. There are several methods and systems for echo estimation in conventional technology. Herein, unnecessary details thereof are not given.
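The disabling of coefficient adaptation can be sketched in Python as follows. The description does not name a particular adaptive algorithm, so the normalized least-mean-squares (NLMS) update used here is an assumption, as are the function name, the step size `mu`, and the regularization constant `eps`.

```python
def nlms_step(h_hat, x_vec, mic_sample, double_talk, mu=0.5, eps=1e-8):
    """One adaptive-filter step; coefficients are frozen during double-end talk.

    h_hat       -- current echo path estimate (list of floats)
    x_vec       -- most recent remote samples, newest first, same length as h_hat
    mic_sample  -- current local microphone sample
    double_talk -- decision of the double-end talk detector for this step
    Returns (error_sample, updated_h_hat).
    """
    y_hat = sum(h * x for h, x in zip(h_hat, x_vec))   # estimated echo
    err = mic_sample - y_hat                            # error signal
    if double_talk:
        return err, list(h_hat)                         # adaptation disabled
    norm = sum(x * x for x in x_vec) + eps
    updated = [h + mu * err * x / norm for h, x in zip(h_hat, x_vec)]
    return err, updated


h0 = [0.0, 0.0]
x = [1.0, 0.0]
err_a, adapted = nlms_step(h0, x, 0.5, double_talk=False, mu=1.0)
err_b, frozen = nlms_step(h0, x, 0.5, double_talk=True, mu=1.0)
```

The error signal is still produced while adaptation is frozen, so echo subtraction continues with the last coefficients obtained before the local voice signal appeared.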

As described, the echo eliminating method and system of the present invention detect double-end talk by determining whether a determinant (ξ) corresponding to a variance ({circumflex over (σ)}_(n) ²) of the estimated noise signal and a variance (σ_(e) ²) of the error signal is lower than a preset threshold. The present invention further considers the influence of non-linear echo, and restricts the calculation to the low-frequency domain in order to remove the high frequency influence of non-linear echo, thereby correctly determining whether double-end talk has occurred. Furthermore, for double-end talk occurrences, an echo estimating adaptation process/device (such as an adaptive filtering method/device for an adaptive coefficient) is disabled, thereby enabling efficient echo elimination.

According to another embodiment in accordance with the present invention, a circuit for eliminating echo in a speaker-microphone communication system is provided. The system comprises a noise energy estimating device, an echo estimating device, and an error signal determining device. The circuit comprises a determinant calculator and a double-end talk detecting device. The determinant calculator is configured for calculating a determinant (ξ), wherein the determinant (ξ) corresponds to the variance (σ_(e) ²) of an error signal from the error signal determining device and variance ({circumflex over (σ)}_(n) ²) of an estimated noise signal from the noise energy estimating device. The double-end talk detecting device is configured for comparing the determinant (ξ) and a preset threshold, wherein when the determinant (ξ) is lower than the preset threshold, it is determined that double-end talk has not occurred, otherwise, it is determined that double-end talk has occurred.

It should be understood by those of ordinary skill in the art, that the steps of the described method can be executed in a time sequence or not in a time sequence. Also, steps of the described method can be executed independently or in parallel. The described device can represent a hardware apparatus, a software module, or firmware. The described devices can be separated into more parts or be combined into fewer devices.

While the invention has been described by way of example and in terms of the preferred embodiments, it is to be understood that the invention is not limited to the disclosed embodiments. To the contrary, it is intended to cover various modifications and similar arrangements (as would be apparent to those skilled in the art). Therefore, the scope of the appended claims should be accorded the broadest interpretation so as to encompass all such modifications and similar arrangements. 

1. A method for eliminating echo in a speaker-microphone communication system, comprising performing a noise energy estimating process on a local microphone signal in order to obtain an estimated noise signal, wherein the local microphone signal comprises a local voice signal, possible background noise signal, and possible remote voice signal output from a speaker and received by a microphone; performing an echo estimating process on a remote voice signal to obtain an estimated echo signal, wherein the remote voice signal is output from the speaker; determining an error signal from the local microphone signal and the estimated echo signal; calculating a variance (σ_(e) ²) of the error signal and a variance ({circumflex over (σ)}_(n) ²) of the estimated noise signal; calculating a determinant (ξ), wherein the determinant (ξ) corresponds to the variance (σ_(e) ²) of the error signal and variance ({circumflex over (σ)}_(n) ²) of the estimated noise signal; and comparing the determinant (ξ) and a preset threshold, wherein when the determinant (ξ) is lower than the preset threshold, it is determined that double-end talk has not occurred, otherwise, it is determined that double-end talk has occurred.
 2. The method as claimed in claim 1, wherein the determinant ξ=c·|σ_(e) ²−{circumflex over (σ)}_(n) ²|, and ‘c’ is a constant.
 3. The method as claimed in claim 1, wherein the determinant ${0 \leq \xi} = {1 - {{\frac{{\hat{\sigma}}_{n}^{2}}{\sigma_{e}^{2}}}.}}$
 4. The method as claimed in claim 1, further comprising: calculating a variance ({circumflex over (σ)}_(y) ²) of the estimated echo signal, wherein the determinant $\xi = {{\frac{\sigma_{e}^{2} - {\hat{\sigma}}_{n}^{2}}{{\hat{\sigma}}_{y}^{2}}}.}$
 5. The method as claimed in claim 4, wherein the variance (σ_(e) ²) of the error signal, the variance ({circumflex over (σ)}_(n) ²) of the estimated noise signal, and/or the variance ({circumflex over (σ)}_(y) ²) of the estimated echo signal are calculated for a whole frequency domain or for a low frequency domain.
 6. The method as claimed in claim 5, wherein the determinant (ξ) is calculated for each frame, and the error signal of a current frame, the estimated noise signal of the current frame, and the estimated echo signal of the current frame are transformed, using N point Fast Fourier Transform, to corresponding frequency domain signals, respectively, and the variance (σ_(e) ²) of the error signal, variance ({circumflex over (σ)}_(n) ²) of the estimated noise signal, or variance ({circumflex over (σ)}_(y) ²) of the estimated echo signal are calculated according to the following equations: $\begin{matrix} {{\sigma_{e}^{2}(i)} = {{{\lambda\sigma}_{e}^{2}\left( {i - 1} \right)} + {\left( {1 - \lambda} \right){\sum\limits_{j = 0}^{P}{E_{j}(i)}}}}} \\ {{{\hat{\sigma}}_{n}^{2}(i)} = {{\lambda{{\hat{\sigma}}_{n}^{2}\left( {i - 1} \right)}} + {\left( {1 - \lambda} \right){\sum\limits_{j = 0}^{P}{{Noise}_{j}(i)}}}}} \\ {{{{\hat{\sigma}}_{y}^{2}(i)} = {{\lambda{{\hat{\sigma}}_{y}^{2}\left( {i - 1} \right)}} + {\left( {1 - \lambda} \right){\sum\limits_{j = 0}^{P}{{\hat{Y}}_{j}(i)}}}}},} \end{matrix}$ wherein σ_(e) ²(i) is the variance of the error signal in the i^(th) frame, {circumflex over (σ)}_(n) ²(i) is the variance of the estimated noise signal in the i^(th) frame, {circumflex over (σ)}_(y) ²(i) is the variance of the estimated echo signal in the i^(th) frame, E_(j)(i) represents energy at frequency j of the frequency domain signal of the error signal in the i^(th) frame, Noise_(j)(i) represents energy at frequency j of the frequency domain signal of the noise signal in the i^(th) frame, and Ŷ_(j)(i) represents energy at frequency j of the frequency domain signal of the estimated echo signal in the i^(th) frame, wherein 0.9<λ<1, λ is a real number, 0<P≦N/2−1, P is a positive integer, and the value of P is determined by the non-linearity of the communication system.
 7. The method as claimed in claim 1, wherein when it is determined that double-end talk has occurred, the echo estimating adaptation process of the echo estimating coefficient is disabled.
 8. A system for eliminating echo in a speaker-microphone communication system, comprising: a noise energy estimating device, performing a noise energy estimating process on a local microphone signal in order to obtain an estimated noise signal, wherein the local microphone signal comprises a local voice signal, possible background noise signal, and possible remote voice signal output from a speaker and received by a microphone; an echo estimating device, performing an echo estimating process on a remote voice signal to obtain an estimated echo signal, wherein the remote voice signal is output from the speaker; an error signal determining device, determining an error signal from the local microphone signal and the estimated echo signal; a determinant calculator, calculating a determinant (ξ), wherein the determinant (ξ) corresponds to the variance (σ_(e) ²) of the error signal and variance ({circumflex over (σ)}_(n) ²) of the estimated noise signal; and a double-end talk detecting device, comparing the determinant (ξ) and a preset threshold, wherein when the determinant (ξ) is lower than the preset threshold, it is determined that double-end talk has not occurred, otherwise, it is determined that double-end talk has occurred.
 9. The system as claimed in claim 8, wherein the determinant calculator calculates the determinant as ξ=c·|σ_(e) ²−{circumflex over (σ)}_(n) ²|, and ‘c’ is a constant.
 10. The system as claimed in claim 8, wherein the determinant calculator calculates the determinant as ${0 \leq \xi} = {1 - {{\frac{{\hat{\sigma}}_{n}^{2}}{\sigma_{e}^{2}}}.}}$
 11. The system as claimed in claim 8, wherein the determinant calculator further performs the step of: calculating a variance ({circumflex over (σ)}_(y) ²) of the estimated echo signal, wherein the determinant $\xi = {{\frac{\sigma_{e}^{2} - {\hat{\sigma}}_{n}^{2}}{{\hat{\sigma}}_{y}^{2}}}.}$
 12. The system as claimed in claim 11, wherein the determinant calculator calculates variance (σ_(e) ²) of the error signal, the variance ({circumflex over (σ)}_(n) ²) of the estimated noise signal, and/or the variance ({circumflex over (σ)}_(y) ²) of the estimated echo signal for a whole frequency domain or for a low frequency domain.
 13. The system as claimed in claim 12, wherein the determinant calculator calculates the determinant (ξ) for each frame, and transforms, using N point Fast Fourier Transform, the error signal of a current frame, the estimated noise signal of the current frame, and the estimated echo signal of the current frame to corresponding frequency domain signals, respectively, and calculates the variance (σ_(e) ²) of the error signal, variance ({circumflex over (σ)}_(n) ²) of the estimated noise signal, or variance ({circumflex over (σ)}_(y) ²) of the estimated echo signal according to the following equations: $\begin{matrix} {{\sigma_{e}^{2}(i)} = {{{\lambda\sigma}_{e}^{2}\left( {i - 1} \right)} + {\left( {1 - \lambda} \right){\sum\limits_{j = 0}^{P}{E_{j}(i)}}}}} \\ {{{\hat{\sigma}}_{n}^{2}(i)} = {{\lambda{{\hat{\sigma}}_{n}^{2}\left( {i - 1} \right)}} + {\left( {1 - \lambda} \right){\sum\limits_{j = 0}^{P}{{Noise}_{j}(i)}}}}} \\ {{{{\hat{\sigma}}_{y}^{2}(i)} = {{\lambda{{\hat{\sigma}}_{y}^{2}\left( {i - 1} \right)}} + {\left( {1 - \lambda} \right){\sum\limits_{j = 0}^{P}{{\hat{Y}}_{j}(i)}}}}},} \end{matrix}$ wherein σ_(e) ²(i) is the variance of the error signal in the i^(th) frame, {circumflex over (σ)}_(n) ²(i) is the variance of the estimated noise signal in the i^(th) frame, {circumflex over (σ)}_(y) ²(i) is the variance of the estimated echo signal in the i^(th) frame, E_(j)(i) represents energy at frequency j of the frequency domain signal of the error signal in the i^(th) frame, Noise_(j)(i) represents energy at frequency j of the frequency domain signal of the noise signal in the i^(th) frame, and Ŷ_(j)(i) represents energy at frequency j of the frequency domain signal of the estimated echo signal in the i^(th) frame, wherein 0.9<λ<1, λ is a real number, 0<P≦N/2−1, P is a positive integer, and the value of P is determined by the non-linearity of the communication system.
 14. The system as claimed in claim 8, wherein when the double-end talk detecting device determines that double-end talk has occurred, the echo estimating adaptation process of the echo estimating coefficient is disabled.
 15. A circuit for eliminating echo in a speaker-microphone communication system, wherein the system comprises a noise energy estimating device, an echo estimating device, and an error signal determining device, comprising: a determinant calculator, calculating a determinant (ξ), wherein the determinant (ξ) corresponds to the variance (σ_(e) ²) of an error signal from the error signal determining device and variance ({circumflex over (σ)}_(n) ²) of an estimated noise signal from the noise energy estimating device; and a double-end talk detecting device, comparing the determinant (ξ) and a preset threshold, wherein when the determinant (ξ) is lower than the preset threshold, it is determined that double-end talk has not occurred, otherwise, it is determined that double-end talk has occurred.
 16. The circuit of claim 15, wherein the determinant calculator calculates the determinant as ξ=c·|σ_(e) ²−{circumflex over (σ)}_(n) ²|, and ‘c’ is a constant.
 17. The circuit of claim 15, wherein the determinant calculator calculates the determinant as ${0 \leq \xi} = {1 - {{\frac{{\hat{\sigma}}_{n}^{2}}{\sigma_{e}^{2}}}.}}$
 18. The circuit of claim 15, wherein the determinant calculator further performs the step of: calculating a variance ({circumflex over (σ)}_(y) ²) of an estimated echo signal from the echo estimating device, wherein the determinant $\xi = {{\frac{\sigma_{e}^{2} - {\hat{\sigma}}_{n}^{2}}{{\hat{\sigma}}_{y}^{2}}}.}$
 19. The circuit of claim 18, wherein the determinant calculator calculates variance (σ_(e) ²) of the error signal, the variance ({circumflex over (σ)}_(n) ²) of the estimated noise signal, and/or the variance ({circumflex over (σ)}_(y) ²) of the estimated echo signal for a whole frequency domain or for a low frequency domain, wherein the determinant calculator calculates the determinant (ξ) for each frame, and transforms, using N point Fast Fourier Transform, the error signal of a current frame, the estimated noise signal of the current frame, and the estimated echo signal of the current frame to corresponding frequency domain signals, respectively, and calculates the variance (σ_(e) ²) of the error signal, variance ({circumflex over (σ)}_(n) ²) of the estimated noise signal, or variance ({circumflex over (σ)}_(y) ²) of the estimated echo signal according to the following equations: $\begin{matrix} {{\sigma_{e}^{2}(i)} = {{{\lambda\sigma}_{e}^{2}\left( {i - 1} \right)} + {\left( {1 - \lambda} \right){\sum\limits_{j = 0}^{P}{E_{j}(i)}}}}} \\ {{{\hat{\sigma}}_{n}^{2}(i)} = {{\lambda{{\hat{\sigma}}_{n}^{2}\left( {i - 1} \right)}} + {\left( {1 - \lambda} \right){\sum\limits_{j = 0}^{P}{{Noise}_{j}(i)}}}}} \\ {{{{\hat{\sigma}}_{y}^{2}(i)} = {{\lambda{{\hat{\sigma}}_{y}^{2}\left( {i - 1} \right)}} + {\left( {1 - \lambda} \right){\sum\limits_{j = 0}^{P}{{\hat{Y}}_{j}(i)}}}}},} \end{matrix}$ wherein σ_(e) ²(i) is the variance of the error signal in the i^(th) frame, {circumflex over (σ)}_(n) ²(i) is the variance of the estimated noise signal in the i^(th) frame, {circumflex over (σ)}_(y) ²(i) is the variance of the estimated echo signal in the i^(th) frame, E_(j)(i) represents energy at frequency j of the frequency domain signal of the error signal in the i^(th) frame, Noise_(j)(i) represents energy at frequency j of the frequency domain signal of the noise signal in the i^(th) frame, and Ŷ_(j)(i) represents energy at frequency j of the frequency domain signal of the estimated echo signal in the i^(th) frame, wherein 0.9<λ<1, λ is a real number, 0<P≦N/2−1, P is a positive integer, and the value of P is determined by the non-linearity of the communication system.
 20. The circuit of claim 15, wherein when the double-end talk detecting device determines that double-end talk has occurred, the echo estimating adaptation process of the echo estimating coefficient is disabled. 